CS 3240: Languages and Computation - PowerPoint PPT Presentation

Provided by: ble87

1
CS 3240: Languages and Computation
  • Non-Context-Free Languages and Recursive Descent
    Parsing

2
Recall: Pumping Lemma for Regular Languages
  • For every regular language $L$, there is a finite
    pumping length $p$ such that any string $s \in L$
    with $|s| \ge p$ can be written as $s = xyz$ with
    1) $x y^i z \in L$ for every $i \in \{0, 1, 2, \ldots\}$,
    2) $|y| \ge 1$, and
    3) $|xy| \le p$

3
Pumping Lemma for CFL
Theorem: For every context-free language $L$, there
is a pumping length $p$ such that any string
$s \in L$ with $|s| \ge p$ can be written as $s = uvxyz$ with
1) $u v^i x y^i z \in L$ for every $i \in \{0, 1, 2, \ldots\}$,
2) $|vy| \ge 1$, and
3) $|vxy| \le p$.
Note that 1) implies that $uxz \in L$ (take $i = 0$),
requirement 2) says that $v$ and $y$ cannot both be
the empty string $\varepsilon$, and condition 3) is
useful in proving that a language is not context-free.
4
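Pumping a Parse Tree
[Figure: parse tree with root S and yield u v x y z; a variable A occurs twice on one path, the upper A deriving vxy and the lower A deriving x]
If $s = uvxyz \in L$ is long, then its parse tree is
tall. Hence there is a path on which a variable
A repeats itself. We can pump the part of the tree between the two A's.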
5
$uvxyz \in L$
[Figure: the same parse tree for u v x y z]
Repeating the part between the two A's gives
6
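$uv^2xy^2z \in L$
[Figure: parse tree in which the part between the two A's appears twice, giving the yield u v v x y y z]
while removing the part between the two A's gives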
7
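$uxz \in L$
[Figure: parse tree in which the upper A derives x directly, giving the yield u x z]
In general, $u v^i x y^i z \in L$ for all $i = 0, 1, 2, \ldots$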
8
Finding the Pumping Length of a CFL
  • Let $b$ be the length of the longest right-hand side of any
    rule (assume $b > 1$)
  • Each node in the parse tree has at most $b$
    children
  • At most $b^h$ nodes are $h$ steps from the start node
  • Let $p = b^{|V|+2}$, where $|V|$ is the number of
    variables (could be huge!)
  • For any string of length at least $p$, the tree height
    is at least $|V|+2$
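
The height bound in the last bullet follows from counting leaves. A short derivation, assuming that height counts the edges on the longest root-to-leaf path and that $h$ is the height of a parse tree for $s$:

```latex
\begin{align*}
  |s| &\le b^{h}
      && \text{a tree of height } h \text{ has at most } b^{h} \text{ leaves} \\
  b^{h} &\ge |s| \ge p = b^{|V|+2}
      && \text{for any } s \text{ with } |s| \ge p \\
  h &\ge |V| + 2
      && \text{since } b > 1
\end{align*}
```

A root-to-leaf path of that length passes through more than $|V|$ variables, so by the pigeonhole principle some variable occurs on it twice.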

9
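Formal Proof of Pumping Lemma
Let $G$ be a grammar for the CFL $L$, and let $b \ge 2$ be the
maximum right-hand-side length of its rules $A \to X_1 \cdots X_b$.
For each string $s$ consider a smallest parse tree; it has depth
at least $\log_b |s|$. If $|s| \ge p = b^{|V|+2}$, then the
tree depth is $\ge |V|+2$, hence there is a path and a
variable $A$ such that $A$ repeats itself on that path:
$S \Rightarrow^* uAz \Rightarrow^* uvAyz \Rightarrow^* uvxyz$.
It follows that $u v^i x y^i z \in L$ for all $i = 0, 1, 2, \ldots$
Furthermore:
$|vy| \ge 1$ because the tree is minimal (if $v$ and $y$ were both empty, cutting out the part between the two A's would give a smaller tree for $s$);
$|vxy| \le p$ because the repeated $A$ can be chosen among the lowest $|V|+1$ variables on the longest path, so the subtree deriving $vxy$ has height at most $|V|+2$ and at most $b^{|V|+2} = p$ leaves.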
10
Using the Pumping Lemma
  • Proof by contradiction. Assume that $L$ is context
    free. Let $p$ be the pumping length of $L$ and take
    a string $s \in L$ with $|s| \ge p$.
  • By the pumping lemma it is possible to write
    $s = uvxyz$ (with $|vy| \ge 1$ and $|vxy| \le p$) such that
    $u v^i x y^i z \in L$ for all $i \ge 0$.
  • However, ... (an argument specific to $L$ that some
    $u v^i x y^i z \notin L$)
  • Thus the pumping lemma does not hold for $s$.
    Contradiction. The language $L$ is not
    context-free.

Here you have to be clever in picking the right s.
Show that pumping up or down s gives strings that
are not in L. Be careful to consider all v,x,y
possibilities.
11
Example I: Pumping $a^n b^n c^n$
  • Proof by contradiction: assume $B = \{a^n b^n c^n \mid n \ge 0\}$
    is CF.
  • According to the pumping lemma there is a pumping
    length $p$ such that $s = a^p b^p c^p \in B$ can be
    written as $s = uvxyz$ with $u v^i x y^i z \in B$ for
    all $i \ge 0$.
  • Two options, since $1 \le |vxy| \le p$:
  • 1) $vxy$ lies within $a^* b^*$; then the string $uv^2xy^2z$ has
    too few c's, hence $uv^2xy^2z \notin B$
  • 2) $vxy$ lies within $b^* c^*$; then the string $uv^2xy^2z$ has
    too few a's, hence $uv^2xy^2z \notin B$
  • Contradiction: the pumping lemma does not hold,
    hence $B$ is not context free.
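
The case analysis above can be sanity-checked by brute force for small $p$. The following is a minimal sketch, not a proof: it only tries $i = 2$ and only values of $p$ up to an arbitrary bound. The names `in_B`, `every_split_fails` and `MAX_P` are illustrative and do not come from the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { MAX_P = 8 };  /* illustrative bound that keeps the buffers on the stack */

/* Membership test for B = { a^n b^n c^n | n >= 0 }. */
bool in_B(const char *s, size_t len) {
    if (len % 3 != 0) return false;
    size_t n = len / 3;
    for (size_t i = 0; i < len; i++) {
        char expected = i < n ? 'a' : (i < 2 * n ? 'b' : 'c');
        if (s[i] != expected) return false;
    }
    return true;
}

/* Copy n bytes from src to out[*k..] and advance *k. */
static void put(char *out, size_t *k, const char *src, size_t n) {
    memcpy(out + *k, src, n);
    *k += n;
}

/*
 * Returns true if every split s = uvxyz of s = a^p b^p c^p with
 * |vy| >= 1 and |vxy| <= p gives u v^2 x y^2 z outside B.
 * Cut points: u = s[0,a), v = s[a,b), x = s[b,c), y = s[c,d), z = s[d,len).
 */
bool every_split_fails(size_t p) {
    assert(p >= 1 && p <= MAX_P);
    char s[3 * MAX_P], out[4 * MAX_P];   /* pumping adds at most |vy| <= p symbols */
    size_t len = 3 * p;
    memset(s, 'a', p);
    memset(s + p, 'b', p);
    memset(s + 2 * p, 'c', p);

    for (size_t a = 0; a < len; a++)
        for (size_t d = a + 1; d <= len && d - a <= p; d++)   /* 1 <= |vxy| <= p */
            for (size_t b = a; b <= d; b++)
                for (size_t c = b; c <= d; c++) {
                    if ((b - a) + (d - c) == 0) continue;     /* need |vy| >= 1 */
                    size_t k = 0;
                    put(out, &k, s, a);             /* u */
                    put(out, &k, s + a, b - a);     /* v */
                    put(out, &k, s + a, b - a);     /* v again */
                    put(out, &k, s + b, c - b);     /* x */
                    put(out, &k, s + c, d - c);     /* y */
                    put(out, &k, s + c, d - c);     /* y again */
                    put(out, &k, s + d, len - d);   /* z */
                    if (in_B(out, k)) return false; /* this split survives pumping */
                }
    return true;
}
```

Calling `every_split_fails(p)` for a few small values of `p` exercises both options of the proof, since every window $vxy$ of length at most $p$ misses either the a's or the c's.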

12
Example II (Pumping down)
  • Prove that $C = \{a^i b^j c^k \mid 0 \le i \le j \le k\}$ is not
    context-free.
  • Let $p$ be the pumping length, and $s = a^p b^p c^p \in C$.
  • Pumping lemma: $s = uvxyz$, such that $u v^i x y^i z \in C$
    for every $i \ge 0$. Two options, since $1 \le |vxy| \le p$:
  • 1) $vxy$ lies within $a^* b^*$; then the string $uv^2xy^2z$ has
    too few c's, hence $uv^2xy^2z \notin C$
  • 2) $vxy$ lies within $b^* c^*$; then the string $uv^0xy^0z = uxz$ has
    too many a's, hence $uv^0xy^0z \notin C$
  • Contradiction: the pumping lemma does not hold
    for this string $a^p b^p c^p$, hence $C$ is not
    context-free.

13
Example III: ww
  • Prove that the language $D = \{ww \mid w \in \{0,1\}^*\}$ is
    not CF.
  • You must be careful when picking the string $s \in D$
  • Let $p$ be the pumping length, take $s = 0^p 1^p 0^p 1^p$.
  • Options for $s = uvxyz$ with $1 \le |vxy| \le p$:
  • If $vxy$ is to the left of the middle of $0^p 1^p 0^p 1^p$,
    then the second half of $uv^2xy^2z$ starts with a
    1 (while the first half starts with a 0).
  • If $vxy$ is to the right of the middle of $0^p 1^p 0^p 1^p$,
    then the first half of $uv^2xy^2z$ ends with
    a 0 (while the second half ends with a 1).
  • If $vxy$ contains the middle of $0^p 1^p 0^p 1^p$, then the pumped-down
    string $uxz$ equals $0^p 1^i 0^j 1^p \notin D$ (because $i$ or $j < p$)

14
Final Note on CFL
  • Let $A_1$ and $A_2$ be two context-free languages; then
    the union $A_1 \cup A_2$ is also context free.
  • However, the intersection $A_1 \cap A_2$ is not
    necessarily context free
  • The complement $\overline{A_1}$ is not necessarily
    context free either

15
Where Are We Now?
  • So far: Automata and languages
      • Regular languages
      • Context-free languages
  • Next few weeks: Construction of parsers
      • Top-down parsing
      • Bottom-up parsing
      • Semantic analysis
  • Later in the semester: Computability theory

16
Parser Classification
  • Parsers are broadly broken down into
      • LL - Top-down parsers
          • L - Scans left to right
          • L - Traces a leftmost derivation of the input string
      • LR - Bottom-up parsers
          • L - Scans left to right
          • R - Traces a rightmost derivation of the input string (in reverse)
  • LL is a subset of LR (every LL(k) grammar is also LR(k))
  • Typical notation
      • LL(0), LL(1), LR(1), LR(k)
      • Number (k) refers to maximum look ahead
      • Lower is better!

17
Recursive Descent Parsing
  • Simple top-down parsing technique for
    hand-written parsers
  • Convert every nontrivial variable into a function
  • Assume we have a scanner from which we get a
    token and match it by calling
      • matchToken(token), which consumes the token if
        it matches, or reports an error if it does not
  • We also need
      • peekToken() to get the current token without
        consuming it
  • Output is an abstract syntax tree
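
The slides use peekToken() and matchToken() without defining them. The sketch below shows one possible scanner over a string, under several assumptions that are not in the original: the token kinds END and BAD, the helper setInput(), and the choice to exit on a mismatch are all illustrative.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Token kinds as on the later slides, plus END and BAD (added for this sketch). */
enum { PLUS, MINUS, MULT, LPAREN, RPAREN, NUM, ID, END, BAD };

static const char *input = "";   /* unread part of the source text */

/* Hypothetical helper: point the scanner at a new string. */
void setInput(const char *text) {
    assert(text != NULL);
    input = text;
}

/* Classify the current token without consuming it. */
int peekToken(void) {
    while (isspace((unsigned char)*input)) input++;
    unsigned char c = (unsigned char)*input;
    switch (c) {
    case '\0': return END;
    case '+':  return PLUS;
    case '-':  return MINUS;
    case '*':  return MULT;
    case '(':  return LPAREN;
    case ')':  return RPAREN;
    }
    if (isdigit(c)) return NUM;
    if (isalpha(c)) return ID;
    return BAD;
}

/* Consume the current token if it has the expected kind; otherwise report an error. */
void matchToken(int expected) {
    if (peekToken() != expected) {
        fprintf(stderr, "syntax error near \"%s\"\n", input);
        exit(EXIT_FAILURE);
    }
    if (expected == NUM) {
        while (isdigit((unsigned char)*input)) input++;
    } else if (expected == ID) {
        while (isalnum((unsigned char)*input)) input++;
    } else if (expected != END) {
        input++;   /* every other token is a single character */
    }
}
```

Because peekToken() never advances past a token, a parser function can call it repeatedly to decide which rule applies and only then commit with matchToken().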

18
A Familiar Example
  • <expr> → <expr> <addop> <term> | <term>
  • <term> → <term> * <factor> | <factor>
  • <factor> → '(' <expr> ')' | num | id
  • <addop> → + | -
  • Notation is called Backus-Naur form (BNF)
  • num and id are terminal symbols, supplied by
    scanner
  • How do we apply recursive descent to it?

19
Problem
  • Grammar contains left-recursive productions
  • Not suitable for top-down parsing, as it may run
    into an infinite loop

<expr> → <expr> <addop> <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → '(' <expr> ')' | num | id
<addop> → + | -
20
Extended Backus-Naur Form
  • For simple cases, one solution is EBNF
  • Uses { } notation to indicate 0 or more repetitions
  • <expr> → <term> { <addop> <term> }
  • Concept is similar to the * operator of regexp
  • Num → [0-9][0-9]*

[Figure labels: head = <term>, tail = { <addop> <term> }]
21
EBNF Back to BNF
  • <expr> → <term> <e_tail>
  • <e_tail> → <addop> <term> <e_tail> | ε

Example: 1 + 2 * 3
22
Continued
  • <term> → <term> * <factor> | <factor>
  • EBNF
      • <term> → <factor> { * <factor> }
  • BNF
      • <term> → <factor> <t_tail>
      • <t_tail> → * <factor> <t_tail> | ε
  • Now top-down parsing will work!

23
Revised Grammar Rules
  • <expr> → <term> <e_tail>
  • <e_tail> → <addop> <term> <e_tail> | ε
  • <term> → <factor> <t_tail>
  • <t_tail> → * <factor> <t_tail> | ε
  • <factor> → '(' <expr> ')' | num | id
  • <addop> → + | -

24
Solution from EBNF: Nonrecursive Version
  • Map tail to a loop
  • <addop> was mapped to token matching

enum { PLUS, MINUS, MULT, LPAREN, RPAREN, NUM, ID };

void expr() {
    term();
    int token;
    /* the EBNF tail { <addop> <term> } becomes a loop */
    while ((token = peekToken()) == PLUS || token == MINUS) {
        matchToken(token);
        term();
    }
}

<expr> → <term> { <addop> <term> }
<addop> → + | -
25
Solution from BNF: Recursive Version

enum { PLUS, MINUS, MULT, LPAREN, RPAREN, NUM, ID };

void expr() {
    term();
    e_tail();
}

<expr> → <term> <e_tail>
<e_tail> → <addop> <term> <e_tail> | ε
<addop> → + | -
26
void e_tail() {
    int token;
    if ((token = peekToken()) == PLUS || token == MINUS) {
        matchToken(token);
        term();
        e_tail();
    } else {
        return;   /* ε: no further <addop> */
    }
}

<expr> → <term> <e_tail>
<e_tail> → <addop> <term> <e_tail> | ε
<addop> → + | -
27
void term(void) {
    factor();
    t_tail();
}

void t_tail() {
    if (peekToken() == MULT) {
        matchToken(MULT);
        factor();   /* the rule is * <factor> <t_tail>, so call factor(), not term() */
        t_tail();
    } else {
        return;     /* ε: no further * */
    }
}

<term> → <factor> <t_tail>
<t_tail> → * <factor> <t_tail> | ε
28
void factor() {
    if (peekToken() == LPAREN) {
        matchToken(LPAREN);
        expr();
        matchToken(RPAREN);
    } else if (peekToken() == NUM) {
        matchToken(NUM);
    } else if (peekToken() == ID) {
        matchToken(ID);
    }
    /* any other token is a syntax error and should be reported here */
}

<factor> → '(' <expr> ')' | num | id
29-53
Trace of the recursive-descent calls on the input 1 + 2 * 3 (one animation frame per slide):
  • expr calls term, and term calls factor
  • factor finds num (1): success
  • term calls t_tail, which finds nothing (ε): success, so term succeeds
  • expr calls e_tail, which finds + and calls term
  • term calls factor, which finds num (2): success
  • term calls t_tail, which finds * and calls factor
  • factor finds num (3): success
  • the nested t_tail finds nothing (ε): success, so t_tail and term succeed
  • e_tail succeeds, and therefore expr succeeds
54
What happened?
[Figure: parse tree for 1 + 2 * 3, shown here as an indented outline]
<expr>
    <term>
        <factor>
            num (1)
        <t_tail>
            ε
    <e_tail>
        <addop>
            +
        <term>
            <factor>
                num (2)
            <t_tail>
                *
                <factor>
                    num (3)
                <t_tail>
                    ε
        <e_tail>
            ε
55
More on Left Recursion
  • If a grammar is left recursive we must first
    rewrite it to make it right recursive
  • Case 1: Simple immediate left recursion
      • A → A u | v, where v does not start with A
      • Change to A → v A', A' → u A' | ε
  • Example: Change
      • exp → exp addop term | term
  • to
      • exp → term exp'
      • exp' → addop term exp' | ε

56
More General Case
  • General Immediate Left Recursion
      • A → A u1 | A u2 | ... | A un | v1 | v2 | ... | vm,
        where no vi starts with A
  • Example
      • exp → exp + term | exp - term | term
  • Solution
      • A → v1 A' | v2 A' | ... | vm A'
      • A' → u1 A' | u2 A' | ... | un A' | ε
  • Example
      • exp → term exp'
      • exp' → + term exp' | - term exp' | ε
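
The general rewrite is mechanical enough to sketch in code. This is one way to do it, assuming the alternatives have already been split into the left-recursive tails u1..un and the other alternatives v1..vm. The function name, its parameters and the text layout of the result are illustrative choices, with "eps" standing in for ε.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append s to the NUL-terminated buffer out of capacity cap, truncating if full. */
static void append(char *out, size_t cap, const char *s) {
    size_t used = strlen(out);
    if (used + 1 < cap) strncat(out, s, cap - used - 1);
}

/*
 * Given  A -> A u1 | ... | A un | v1 | ... | vm  (no vi starts with A),
 * write the right-recursive replacement into out as two lines:
 *   A  -> v1 A' | ... | vm A'
 *   A' -> u1 A' | ... | un A' | eps
 */
void remove_immediate_left_recursion(const char *a,
                                     const char *const u[], int n,
                                     const char *const v[], int m,
                                     char *out, size_t cap) {
    assert(out != NULL && cap > 0);
    assert(n >= 1 && m >= 1);
    out[0] = '\0';

    /* First rule: every non-recursive alternative, followed by A'. */
    append(out, cap, a);
    append(out, cap, " ->");
    for (int i = 0; i < m; i++) {
        append(out, cap, i ? " | " : " ");
        append(out, cap, v[i]);
        append(out, cap, " ");
        append(out, cap, a);
        append(out, cap, "'");
    }
    append(out, cap, "\n");

    /* Second rule: every recursive tail, followed by A', plus the empty string. */
    append(out, cap, a);
    append(out, cap, "' ->");
    for (int i = 0; i < n; i++) {
        append(out, cap, i ? " | " : " ");
        append(out, cap, u[i]);
        append(out, cap, " ");
        append(out, cap, a);
        append(out, cap, "'");
    }
    append(out, cap, " | eps\n");
}
```

With a = "exp", u = {"+ term", "- term"} and v = {"term"}, the buffer holds the two rules of the slide's example.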

57
Indirect Left Recursion
  • A → A u1 | A u2 | ... | A un | v1 | v2 | ... | vm,
    where vi ⇒* A w for some vi
  • Example: A → Ba | Aa | c,  B → Bb | Ab | d
  • Solution
      • For each rule Ai → Aj v with j < i and Aj → w1 | w2 |
        ... | wk, replace the former rule by Ai → w1 v |
        w2 v | ... | wk v, assuming Aj has no immediate
        left recursion

58
Example
  • Example: A → Ba | Aa | c,  B → Bb | Ab | d
  • The rules for A can be rewritten to A → BaA' | cA',  A' →
    aA' | ε
  • Then substitute A into the RHS of B: B → Bb | BaA'b |
    cA'b | d
  • Finally, remove left recursion in B: B → cA'bB' |
    dB',  B' → bB' | aA'bB' | ε

59
Left Factoring
  • Required if two or more grammar rule choices
    share a common prefix string: A → uv | uw
  • Would cause difficulties if we look ahead only
    one token
  • Solution: A → uA',  A' → v | w

60
Problem: Left Association?
  • Can we still maintain left-association?
  • The parse tree of
      • <expr> → <term> { <addop> <term> }
  • is right-associative
  • One solution: introduce temporary variables

int expr() {
    int temp = term();
    int token;
    while ((token = peekToken()) == PLUS || token == MINUS) {
        matchToken(token);
        /* fold each new term into the running result, left to right */
        if (token == PLUS) temp += term();
        else temp -= term();
    }
    return temp;
}
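
To see the left-association at work, the loop can be embedded in a small evaluator. The following is a sketch under stated assumptions: it has its own minimal scanner over a string, drops the id token (identifiers have no value here), and adds END, BAD, lastNumber and evaluate(), none of which appear in the slides.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

enum { PLUS, MINUS, MULT, LPAREN, RPAREN, NUM, END, BAD };

static const char *input = "";   /* unread part of the source text */
static int lastNumber;           /* value of the most recently matched NUM */

static int peekToken(void) {
    while (isspace((unsigned char)*input)) input++;
    switch (*input) {
    case '\0': return END;
    case '+':  return PLUS;
    case '-':  return MINUS;
    case '*':  return MULT;
    case '(':  return LPAREN;
    case ')':  return RPAREN;
    }
    return isdigit((unsigned char)*input) ? NUM : BAD;
}

static void matchToken(int expected) {
    if (peekToken() != expected) {
        fprintf(stderr, "syntax error near \"%s\"\n", input);
        exit(EXIT_FAILURE);
    }
    if (expected == NUM) {
        char *end;
        lastNumber = (int)strtol(input, &end, 10);
        input = end;
    } else if (expected != END) {
        input++;   /* operators and parentheses are one character long */
    }
}

static int expr(void);

/* <factor> -> '(' <expr> ')' | num */
static int factor(void) {
    if (peekToken() == LPAREN) {
        matchToken(LPAREN);
        int value = expr();
        matchToken(RPAREN);
        return value;
    }
    matchToken(NUM);   /* reports an error for any other token */
    return lastNumber;
}

/* <term> -> <factor> { * <factor> } */
static int term(void) {
    int temp = factor();
    while (peekToken() == MULT) {
        matchToken(MULT);
        temp *= factor();
    }
    return temp;
}

/* <expr> -> <term> { <addop> <term> }, accumulated left to right */
static int expr(void) {
    int temp = term();
    int token;
    while ((token = peekToken()) == PLUS || token == MINUS) {
        matchToken(token);
        if (token == PLUS) temp += term();
        else temp -= term();
    }
    return temp;
}

/* Hypothetical entry point: evaluate a whole string. */
int evaluate(const char *text) {
    assert(text != NULL);
    input = text;
    int value = expr();
    matchToken(END);
    return value;
}
```

For the input 10 - 2 - 3 the loop computes (10 - 2) - 3; a right-associative reading would instead compute 10 - (2 - 3).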

61
Construct Syntax Tree

SyntaxTree expr() {
    SyntaxTree temp = term();
    int token;
    while ((token = peekToken()) == PLUS || token == MINUS) {
        matchToken(token);
        SyntaxTree tree = makeOpNode(token);
        tree->leftChild = temp;     /* the tree built so far is the left operand */
        tree->rightChild = term();
        temp = tree;
    }
    return temp;
}

  • Question: How to construct the tree in the
    recursive version?
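
One possible answer to the question above, offered as a sketch and not as the course's intended solution: pass the tree built so far into e_tail() as a parameter, so that each recursive call extends it on the left. To keep the example short, <term> is simplified to a single number and the scanner is inlined. makeNode(), parse(), END and BAD are illustrative names, and nodes are never freed.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

enum { PLUS, MINUS, NUM, END, BAD };

typedef struct Node {
    int token;                          /* PLUS, MINUS or NUM */
    int value;                          /* only meaningful for NUM */
    struct Node *leftChild, *rightChild;
} Node, *SyntaxTree;

static const char *input = "";          /* unread part of the source text */

static int peekToken(void) {
    while (isspace((unsigned char)*input)) input++;
    if (*input == '\0') return END;
    if (*input == '+') return PLUS;
    if (*input == '-') return MINUS;
    return isdigit((unsigned char)*input) ? NUM : BAD;
}

static SyntaxTree makeNode(int token, int value, SyntaxTree l, SyntaxTree r) {
    SyntaxTree n = malloc(sizeof *n);
    if (n == NULL) { perror("malloc"); exit(EXIT_FAILURE); }
    n->token = token;
    n->value = value;
    n->leftChild = l;
    n->rightChild = r;
    return n;
}

/* Simplification for this sketch: a <term> is just a number. */
static SyntaxTree term(void) {
    if (peekToken() != NUM) {
        fprintf(stderr, "number expected near \"%s\"\n", input);
        exit(EXIT_FAILURE);
    }
    char *end;
    int value = (int)strtol(input, &end, 10);
    input = end;
    return makeNode(NUM, value, NULL, NULL);
}

/*
 * <e_tail> -> <addop> <term> <e_tail> | eps
 * `left` is the tree for everything parsed so far.
 */
static SyntaxTree e_tail(SyntaxTree left) {
    int token = peekToken();
    if (token != PLUS && token != MINUS) return left;   /* eps: tree is complete */
    input++;                                            /* consume the operator */
    SyntaxTree right = term();
    return e_tail(makeNode(token, 0, left, right));     /* new node becomes the left operand */
}

/* <expr> -> <term> <e_tail> */
static SyntaxTree expr(void) {
    return e_tail(term());
}

/* Hypothetical entry point: parse a whole string into a tree. */
SyntaxTree parse(const char *text) {
    assert(text != NULL);
    input = text;
    return expr();
}
```

Parsing 10 - 2 - 3 this way puts the second minus at the root with the subtree for 10 - 2 as its left child, which is the same left-associative shape the loop version builds.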

62
Summary
  • Change grammar to remove left-recursion
      • Tail becomes a loop, or
      • Map tail into a new rule
  • Convert each nontrivial variable into a function
    call
  • Left-association by introducing temporary
    variables or constructing a syntax tree
  • Limitations of recursive descent
      • What if multiple options in RHS start with
        variables?
      • Empty-string production A → ε